Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Sep 16, 2022

What changes were proposed in this pull request?

In the PR, I propose to migrate all parsing errors onto temporary error classes with the prefix _LEGACY_ERROR_TEMP_. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.

Why are the changes needed?

The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes.

Does this PR introduce any user-facing change?

No. The error messages should be almost the same by default.

How was this patch tested?

By running the modified test suites:

$ build/sbt "core/testOnly *SparkThrowableSuite"
$ build/sbt "test:testOnly *ExpressionParserSuite"

@MaxGekk MaxGekk changed the title [WIP][SQL] Migrate parsing errors onto error classes [WIP][SPARK-40473][SQL] Migrate parsing errors onto error classes Sep 16, 2022
@MaxGekk MaxGekk changed the title [WIP][SPARK-40473][SQL] Migrate parsing errors onto error classes [SPARK-40473][SQL] Migrate parsing errors onto error classes Sep 16, 2022
@MaxGekk MaxGekk marked this pull request as ready for review September 16, 2022 14:57
@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 16, 2022

cc @srielau @anchovYu Could you take a look at the PR, please.

Copy link
Contributor

@srielau srielau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool!
I have some general questions, observations:

  • You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots"
  • How do we keep the numbering straight for the next n PRs? I assume you used some tooling to whip this up so fast. So this was 0 => 1. How do we get from n => n + 1?


/**/
{
"errorClass" : "_LEGACY_ERROR_TEMP_055"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we loose the context?

Copy link
Member Author

@MaxGekk MaxGekk Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We don't provide context in many places (I mean QueryContext not line, pos). This is one of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, users will see the same error message by default (not JSON).

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 16, 2022

You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots"

Some time ago, I have counted the total number of exceptions to be ported onto error classes around 800. I don't see any problems to start using 4 digits after 999.

How do we keep the numbering straight for the next n PRs?

We have 3 types of errors, actually: parsing, compilation and execution. We could migrate them 1-by-1 using sequential numbers. Otherwise the PR will conflict a lot.

I assume you used some tooling to whip this up so fast.

I think it will be faster to manually port the exceptions without any tooling.

} catch {
case e: NumberFormatException =>
throw new ParseException(e.getMessage, ctx)
throw new ParseException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also mention it in readme.md or somewhere: we don't have to instantiate the exception instance in a central file if it's only used once. This can also simplify the error migration work.

@cloud-fan
Copy link
Contributor

Can we update core/src/main/resources/error/README.md to mention this special error class naming prefix?

@github-actions github-actions bot added the DOCS label Sep 19, 2022
4. Check if the exception type already extends `SparkThrowable`.
If true, skip to step 6.
5. Mix `SparkThrowable` into the exception.
5. Mix `SparkThrowable` into the exception, and place it to Query.*Errors at least when it is invoked more than once.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change item 6 instead

Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create a util function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cloud-fan
Copy link
Contributor

@dongjoon-hyun @viirya @HyukjinKwon The new error class name prefix _LEGACY_ERROR_TEMP_ proposed here kind of marks the error as developer-facing, not user-facing. Developers can still get the error class programmatically via the SparkThrowable interface, so that they can build error infra with it. End users won't see the error class in the message. This allows us to do the error migration very quickly, and we can refine the error classes and mark them as user-facing later (naming them properly, adding tests, etc.). What do you think?


Error classes are a succinct, human-readable representation of the error category.

An uncategorized errors can be assigned to a legacy error class with the prefix _LEGACY_ERROR_TEMP_ and an unused sequential number, for instance _LEGACY_ERROR_TEMP_053.
Copy link
Contributor

@itholic itholic Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe we should using \ in front of _LEGACY_ERROR_TEMP_ as an escape character ??

Otherwise, maybe it should look something like: LEGACY_ERROR_TEMP in the documents, which is missing leading and following underscore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will mark it as code. Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 21, 2022

@gatorsmile @cloud-fan @HyukjinKwon @gengliangwang @dongjoon-hyun @viirya I would like to merge this if there are no objections from your side.

throw new ParseException(
errorClass = "_LEGACY_ERROR_TEMP_0061",
messageParameters = Map("msg" -> e.getMessage),
ctx)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do for re-throws? @MaxGekk @cloud-fan

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we check either the exception belongs to an error class or not. If so, just re-throw it as is otherwise assign an error class. But in this PR, I just want to assign error classes and don't want to change existing behaviour.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 22, 2022

Merging to master. Thank you, @cloud-fan @itholic @entong @srielau for review.

@MaxGekk MaxGekk closed this in c4a0360 Sep 22, 2022
MaxGekk added a commit that referenced this pull request Oct 12, 2022
### What changes were proposed in this pull request?
In the PR, I propose to use `checkError()` to check the error class, message parameters and the query context. After the PR #37916, all parsing exceptions should have an error class.

### Why are the changes needed?
1. Checking of the error classes plus the query context improves the test coverage.
2. Eliminating the dependency of error messages. So, tech editors will be able to modify `error-classes.json` w/o fixing the test suite.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test suite:
```
$ build/sbt "test:testOnly *ErrorParserSuite"
```

Closes #38204 from MaxGekk/intercept-parsing-error-class.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants